BACKGROUND OF THE INVENTION



1. Field of the Invention 

This invention relates to implementing spatial sound in systems that 
enable a person to participate in an audio conference with other people across 
a network. Specifically, this invention relates to a system that increases the 
comprehensibility of one or more speakers to enhance a participant's ability to 
listen to a specific speaker when multiple persons are talking, to aid in the 
identification of a speaker by using spatial location cues, and to decrease the 
perception of background noise. This invention also relates to providing 
spatial sound in an audio or audiovisual conference, a long distance learning 
system, or a virtual reality environment. 

2. Discussion of the Related Technology

Spatial sound can be produced using a head-related transfer function. 
Head-related transfer functions have been estimated using dummy heads 
replicating a human head. Due to the shape of the pinna and the human head, 
microphones placed at the ear locations of a dummy head pick up slightly 
different sound signals. Differences between these sound signals provide
spatial location cues for locating a sound source. Several dummy heads, some
complete with ears, eyes, nose, mouth, and shoulders, are pictured in Durand
R. Begault, 3-D Sound for Virtual Reality and Multimedia, 148-53 (1994) 
(Chapter 4: Implementing 3-D Sound). U.S. Patent 5,031,216 to Gorike, et al. 
proposes a partial dummy head having only two pinna replicas mounted on a 
rotate/tilt mechanism. These dummy heads are used in recording studios to 
manufacture binaural stereo recordings; they are not used in a teleconference 
environment. 

In teleconference environments, integrated services digital network 
(ISDN) facilities are increasingly being implemented. ISDN provides a 
completely digital network for integrating computer, telephone, and 



175034 



Attorney Docket No. 414.013 

communications technologies. ISDN is based partially on the standardized 
structure of digital protocols as developed by the International Telegraph and 
Telephone Consultative Committee (CCITT, now ITU-T), so that, despite 
implementations of multiple networks within national boundaries, from a 
user's point of view there is a single uniformly accessible worldwide network 
capable of handling a broad range of telephone, facsimile, computer, data, 
video, and other conventional and enhanced telecommunications services. 

An ISDN customer premise can be interconnected with a local exchange 
(local telephone company) to an ISDN switch. At the customer premise, an 
"intelligent" device, such as a digital PBX, terminal controller, or local area 
network, can be connected to an ISDN termination. Non-ISDN terminals may 
be connected to an ISDN termination through a terminal adapter, which 
performs D/A and A/D conversions and converts non-ISDN protocols to ISDN 
protocols. Basic rate ISDN provides several channels to each customer 
premise, namely a pair of B-channels that each carry 64 kilobits per second 
(kbs) of data, and a D-channel that carries 16 kbs of data. Generally, the
B-channels are used to carry digital data such as pulse code modulated digital
voice signals. Usually, data on the D-channel includes call signalling 
information to and from the central office switch regarding the status of the 
customer telephone, e.g., that the telephone has gone off-hook, control 
information for the telephone ringer, caller identification data, or data to be 
shown on an ISDN telephone display. 

Additionally, an Advanced Intelligent Network (AIN) has been developed
that overlays ISDN facilities and provides a variety of service features to 
customers. Because an AIN is independent of ISDN switch capabilities, AIN 
services can easily be customized for individual users. U.S. Patent 
Nos. 5,418,844 and 5,436,957, the disclosures of which are incorporated by
reference herein, describe many features and services of the AIN. 

In a teleconference environment, several methods have been suggested 
to transmit sound with varying degrees of sound source location information. 
U.S. Patent 4,734,934 to Boggs, et al. proposes a binaural teleconferencing 
system for participants situated at various locations. Each participant has a
single microphone and a stereo headset, and a conference bridge connects the 
participants together. A monaural audio signal from each participant's 
microphone is transmitted to the conference bridge. The conference bridge 
adds time delays to the audio signal to produce an artificial sound source 
location ambience. The time delays added to each incoming monaural signal 
simulate the location of conference participants as being in a semi-circle around 
a single listener. The conference bridge then transmits the delayed signals to 
the conference participants. This system uses a simple time delay to simulate 
different locations for conference participants; it does not use head-related 
transfer functions to create spatial sound signals representing the virtual 
location of each conference participant. 
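
For contrast with the head-related transfer function approach, delay-only placement of the kind just described can be sketched as follows. This is an illustrative sketch, not the implementation of the cited patent; the function name `delay_pan` and the use of a whole-sample delay are assumptions made here.

```python
def delay_pan(mono, delay_samples):
    """Return (left, right) with the ear farther from the source delayed.

    delay_samples > 0 places the source to the listener's right (the left
    ear hears it later); delay_samples < 0 places it to the left.
    """
    pad = [0.0] * abs(delay_samples)
    delayed = pad + list(mono)      # far ear: same samples, shifted later
    direct = list(mono) + pad       # near ear: padded so lengths match
    if delay_samples >= 0:
        return delayed, direct      # left delayed, right direct
    return direct, delayed          # left direct, right delayed


left, right = delay_pan([1.0, 0.5, 0.25], 2)
```

Only the arrival time differs between the two channels; the spectrum of each channel is unchanged, which is the limitation the text attributes to this class of system.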

U.S. Patent 5,020,098 to Celli proposes using left and right microphones 
for each participant that transmit a digitized audio signal and a phase location 
information signal to a conference bridge across ISDN facilities. The 
conference bridge then uses the transmitted location information to control the 
relative audio signal strengths of loudspeakers at the other participants' 
stations to simulate a position in the station for each remote participant. 
Again, this system does not use head-related transfer functions to place 
conference participants in different virtual locations. 

U.S. Patent 4,815,132 to Minami proposes a system for transmitting 
sound having location information in a many-to-many teleconferencing 
situation. This system includes right and left microphones that receive audio 
signals at a first location. Based on the differences between the right and left 
audio signals received by the microphones, the system transmits a single 
channel and an estimated transfer function across ISDN facilities. At a 
receiving location, the right and left signals are reproduced based on the single 
channel signal and the transfer function. Afterwards, the reproduced signals 
are transmitted to right and left loudspeakers at the receiving station. This 
system also does not use head-related transfer functions to create a virtual 
location for each conference participant. 
None of these described systems use head-related transfer functions in
a teleconference environment. Thus, these systems do not truly produce 
spatial sound to place conference participants in a virtual location for ease in 
identifying speakers and distinguishing speech. 

SUMMARY OF THE INVENTION 

The spatial sound conference system enables participants in a 
teleconference to distinguish between speakers even during periods of 
interruption and overtalk, identify speakers based on virtual location cues, 
understand low volume speech, and block out background noise. Spatial sound 
information may be captured using a dummy head at a conference table, or 
spatial sound information may be added to a participant's monaural audio 
signal using head-related transfer functions based on an assigned virtual 
location of a speaker. Spatial sound signals may be reproduced on spatially 
disposed loudspeakers preferably positioned near the ears of a listener. The 
spatial sound conference system is designed to enable conferences across a 
digital network. Aside from purely audio conferences, the system can provide 
spatial sound to audiovisual conferences, long distance learning systems, or 
virtual reality environments implemented across a network. 

Head-related transfer functions simulate the frequency response of audio 
signals across the head from one ear to the other ear to create a spatial 
location for a sound. A computer-generated head-related transfer function 
convolved with a single audio signal creates left and right audio signals with 
a spatial sound component. Head-related transfer functions may also be 
created by recording left and right audio signals at the ears of a human head 
or a dummy head. By inserting a spatial sound component in a teleconference, 
either using a dummy head or spatial sound conference bridge having head- 
related transfer functions, a speaker other than the loudest speaker may be 
heard during periods of interruption and overtalk. Additionally, speakers may 
be more readily identified when they have a virtual location as established 
using spatial sound, and the perception of background noise is reduced. The
term "speaker" as used herein is not limited to an individual talking, but may
be any audio source having an actual or assigned virtual location relative to a 
listener or another speaker. 
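
The convolution step described above may be illustrated with a minimal sketch. It assumes the head-related transfer function is available in its time-domain form, as a pair of impulse responses, one per ear. The names `convolve`, `spatialize`, `HRIR_LEFT`, and `HRIR_RIGHT` are hypothetical, and the impulse response values are toy numbers chosen for illustration, not measured data.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two non-empty sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Create left and right signals from one monaural signal."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses for a source on the listener's right: the right ear
# receives the sound first and louder, the left ear later and attenuated.
HRIR_LEFT = [0.0, 0.0, 0.6, 0.2]
HRIR_RIGHT = [1.0, 0.3]

left, right = spatialize([1.0, 0.0, -1.0], HRIR_LEFT, HRIR_RIGHT)
```

A practical system would likely use measured responses and a fast convolution method; the direct form is used here only because it is short and easy to verify.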

BRIEF DESCRIPTION OF THE DRAWINGS



Figure 1 shows a schematic of a spatial sound conference system using 
a dummy head in a conference room that transmits spatial sound to a 
participant at a remote location across ISDN facilities. Figure 1A shows a
schematic of a many-to-many spatial sound conference using two dummy heads
in two conference rooms.

Figure 2 shows a schematic of a spatial sound conference bridge used in 
a spatial sound conference system. Figures 2A, 2B, 2C, and 2D show an 
example of virtual positions of conference participants. Figures 2E, 2F, 2G, 
and 2H show another example of virtual positions of conference participants.

Figure 3 shows a schematic of a spatial sound conference system
implemented using a spatial sound conference bridge across ISDN facilities.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

Figure 1 shows a schematic of a spatial sound conference system using 
a dummy head in a conference room that transmits spatial sound to a
participant at a remote location across ISDN facilities. Inside conference room
station 100 is a dummy head 101 having at least two spatially disposed
microphones 103, 105 placed at the right and left ear locations. The dummy
head 101 may also contain a loudspeaker 107 at the mouth location, or a
loudspeaker may be placed near the dummy head. The dummy head may also
include shoulders or a torso. Advantageously, the dummy head may be placed
directly on conference table 120 or on a chair in the conference room station 
or otherwise spatially situated at a conference location. Other conference 
participants may be situated about the conference room station, preferably
equally spaced around conference table 120. According to an advantageous
feature, the specialized equipment at conference room station 100 may be kept
to a minimum. Alternatively, the conference room station may be designed as 
rooms for audiovisual conferences, long distance learning system classrooms, 
or virtual reality booths with the attendant equipment necessary for such 
applications. 

The preferred embodiment is described in an ISDN environment; 
however, the invention may be implemented with other digital or analog 
communication channels as long as such channels can adequately handle the 
signal transmissions. In addition, various compression techniques can be used 
to reduce the transmission loads for such communication channels. 

The spatially disposed microphones 103, 105 in the dummy head pick up
audio signals including the speech of the teleconference participants in
conference room station 100. Because of the physical configuration of the
dummy head and the spatially disposed placement of the left and right
microphones, the differences between the left and right microphone signals
capture the spatial components of the sounds in the conference room 100.

In a preferred embodiment, a terminal adapter 128 converts the left and 
right microphone signals to digital data and sends the data across ISDN 
channels to ISDN facilities 150 that include ISDN switches 140, 160. Other 
digital or high bandwidth communication networks such as ADSL, a video
network, or a full-service network, however, may be used to transmit signals
between conference room station 100 and remote participant station 199. The 
two B channels of ISDN are capable of transmitting a bandwidth of 64 kbs 
each. Thus, the right microphone signal may be transmitted on one of the B 
channels, and the left microphone signal may be transmitted on the other B 
channel. 

A compression unit 224 may apply standard compression algorithms, 
such as ISO MPEG Layer II or III or other compression algorithms compliant
with CCITT (now ITU-T) standards G.722 or G.711, to the data signals to
conform to the bandwidth restrictions of the communication network. If a
communication network with a larger bandwidth is available, different 
compression algorithms may be used or compression may not be necessary. 
Telos Systems of Cleveland, Ohio creates a single unit housing an ISDN 
terminal adapter and a MPEG compression and decompression unit, which 
may be used in various embodiments of the spatial sound conference system. 
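
The bandwidth constraint behind the compression step can be checked with simple arithmetic. The sketch below assumes uncompressed pulse code modulated audio and computes the smallest compression ratio that fits one microphone signal into one 64 kbs B-channel. The 48 kHz, 16-bit figures are an assumed example of a high quality source and do not come from this description; the 8 kHz, 8-bit figures are the usual telephone-band values.

```python
B_CHANNEL_BPS = 64_000  # one B-channel, as stated in the text


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of one uncompressed PCM channel, in bits per second."""
    return sample_rate_hz * bits_per_sample


def required_compression_ratio(source_bps, channel_bps=B_CHANNEL_BPS):
    """Smallest ratio that fits the source; 1.0 means no compression needed."""
    return max(1.0, source_bps / channel_bps)


# Assumed high quality source: 48 kHz sampling, 16 bits per sample.
wideband_ratio = required_compression_ratio(pcm_bit_rate(48_000, 16))

# Telephone-band PCM: 8 kHz sampling, 8 bits per sample.
telephone_ratio = required_compression_ratio(pcm_bit_rate(8_000, 8))
```

Under these assumptions the high quality source needs roughly a 12:1 reduction per channel, while telephone-band PCM already fits a B-channel without further compression.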

At the receiving end of the network, the right and left signals are 
transmitted to a remote participant station 199 and the digital signals are 
decompressed using decompression unit 225 and converted back to analog 
using terminal adapter 129. Remote participant station 199 has spatially 
disposed loudspeakers 113, 115 such as a stereo headset or stereo loudspeakers 
for positioning close to the ears of a remote participant. The stereo 
loudspeakers may be embedded in a chair at the remote participant station. 
The spatial sounds reproduced by the loudspeakers allow a listener to 
distinguish speech from background noise more easily, primarily because
speech has a recognizable point sound source while background noise tends to 
emanate from multiple non-point sources or from locations other than the 
speaker point source. Spatial sound allows isolation of the point sound source 
of speech or other audio signal. Also, by concentrating on a specific point 
sound source, a listener can isolate the speech of a single speaker even during 
periods of interruption or overtalk. 

In a preferred embodiment, the remote participant station 199 also 
includes a microphone 117 for picking up the audio speech signals of a remote 
participant. The speech signal from the microphone 117 is converted to digital 
signals by terminal adapter 129, compressed using compression unit 226, sent 
across ISDN facilities 150 using either B-channel, decompressed using 
decompression unit 227, converted back to analog by terminal adapter 128, and 
played through loudspeaker 107 in the conference room station 100.

There will, however, be an echo effect due to a delay caused by the 
compression algorithms of the compression units 226, 224. A slight delay 
occurs when remote participant audio signals are compressed by compression 
unit 226. At conference room station 100, the remote participant audio signals
are decompressed and played through loudspeaker 107. Microphones 103, 105 
will pick up the remote participant audio signals as played and feed them back 
to remote participant station 199. Another slight delay occurs when 
compression unit 224 compresses the remote participant audio signals for 
feedback. The combined effect of these two compression-related delays, sound 
transfer delays between the loudspeaker 107 and microphones 103, 105, and 
any other delays may be perceptible by the remote participant. Adaptive or 
nonadaptive echo cancellation techniques may be used to reduce echoes 
resulting from compression delays and other time delays. 
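
One possible adaptive technique is sketched below, using a normalized least-mean-squares (NLMS) filter. NLMS is chosen here only as a familiar example; the description does not name a particular algorithm. The function name `nlms_echo_cancel`, the tap count, and the step size are assumptions, and the test signal is a synthetic echo (a delayed, attenuated copy of the far-end signal) rather than recorded audio.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic.

    far_end: samples sent to the loudspeaker.
    mic:     samples picked up by the microphone (containing the echo).
    Returns the residual signal after the estimated echo is removed.
    """
    weights = [0.0] * taps
    history = [0.0] * taps              # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        e = d - estimate                # what remains after cancellation
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


# Synthetic example: the echo is the far-end signal delayed by three
# samples and attenuated by half.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [0.0] * 3 + [0.5 * s for s in far[:-3]]
residual = nlms_echo_cancel(far, echo)
```

The filter length must cover the full echo path delay. With compression delays of the size discussed above, a real canceller would need far more taps than this toy example, or a separate bulk delay ahead of the adaptive filter.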

To improve the sound quality from the remote participant station 199, 
a second microphone may be used to capture stereo sound signals, and the 
stereo microphone signals could be sent across ISDN facilities using both
B-channels. Stereo signals from remote participant station 199 may be played at
conference room station 100 either on stereo loudspeakers for positioning close 
to the ears of each participant or on stereo headsets. The stereo loudspeakers 
may be positioned in chairs at the conference room station. Using stereo 
loudspeakers for each participant reduces the need for echo cancellation 
techniques, because the dummy head 101 should not pick up much feedback 
from the stereo loudspeakers for positioning close to the ears of the 
participants. The use of stereo headsets by each participant in conference 
room station 100 should eliminate the need for echo cancellation. 

The remote participant station 199 may also include a head-tracking 
sensor 119. A head-tracking sensor can detect movements of a remote 
participant such as the pan and tilt of a remote participant's head. A sensor, 
such as one manufactured by Polhemus Navigation Sciences Division of 
McDonnell Douglas of Colchester, Vermont, mounted on a headband can sense 
the movement of a head in the pan, tilt, and rotate axes. This movement 
information can be processed using converter 163 and transmitted across ISDN 
facilities 150 using the 16 kbs D channel along with call signaling information.
At the conference room station 100, the D channel may be connected to 
converter 143 and then to a pan/tilt motorized unit 109 for controlling the
dummy head 101. Thus, the dummy head may track the movements of a
remote participant's head. Other servo arrangements may be utilized to
replicate a remote participant's head orientation using the dummy head. This
head-tracking feature greatly increases the spatialization ability of a remote
participant. By directing the movement of the dummy head to face a particular
speaker in conference room station 100, the remote participant can isolate and
understand the speech of that participant, even if that participant is speaking
very softly. Movement of the right and left microphones also provides
additional spatialization cues to the remote listener to aid in locating and
understanding participants within the conference room station. 
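
To show that head orientation data is small relative to the D channel, the sketch below packs the pan, tilt, and rotate angles into a six-byte message and computes the resulting bit rate. The message layout, the hundredth-of-a-degree resolution, and the 60 updates per second figure are assumptions made for illustration. The description does not specify the format used by converters 163 and 143, and framing overhead and call signaling traffic on the D channel are ignored.

```python
import struct

D_CHANNEL_BPS = 16_000   # D channel rate stated in the text
_FORMAT = ">hhh"         # three big-endian signed 16-bit integers


def encode_orientation(pan_deg, tilt_deg, rotate_deg):
    """Pack angles as hundredths of a degree (valid to about +/-327 degrees)."""
    scaled = (round(a * 100) for a in (pan_deg, tilt_deg, rotate_deg))
    return struct.pack(_FORMAT, *scaled)


def decode_orientation(payload):
    """Recover (pan, tilt, rotate) in degrees from a packed message."""
    return tuple(v / 100 for v in struct.unpack(_FORMAT, payload))


def tracking_bit_rate(updates_per_second):
    """Payload bit rate, ignoring any framing or signaling overhead."""
    return struct.calcsize(_FORMAT) * 8 * updates_per_second


message = encode_orientation(30.0, -12.5, 0.0)
rate_at_60_hz = tracking_bit_rate(60)
```

Under these assumptions the payload occupies well under a fifth of the 16 kbs channel, leaving room for the call signaling information that shares it.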

Note that noise reduction may be useful to decrease the effects of any 
unwanted noise produced by motorized unit 109. A noise cancellation unit that 
cancels sound at the frequencies produced by the motorized unit should
prevent the remote participant from hearing the motorized unit each time the
remote participant moves. Other noise reduction methods may be available,
such as placing the motorized unit a certain distance from the dummy head
101 and using a quiet belt drive to move the dummy head.

The movement of the dummy head increases the virtual presence of a
remote participant at the conference room station 100. Thus, participants in
the conference room station speak directly towards the dummy head when they
wish to address a remote participant. The virtual presence may further be
increased by adding a video component. A video camera 175 may be placed
near the dummy head, preferably at the location of the eyes, to transmit
images of the conference room station 100 to the remote participant station
199 across a network. The remote participant station may include a
head-mounted display 176 to present the video image to a remote participant. Other
displays or monitors, however, may be used. The video component may be
added to the spatial sound conference system if bandwidth is available across
the network, or if a video transmission cable connects the two stations
together. Data compression algorithms such as ISO MPEG may be used to
conform to the bandwidth limitations of the communication network, if needed.
An AIN 155 may overlay ISDN facilities 150 and allow participants to
schedule a conference time, promote secure communications using caller 
identification information transmitted on the D-channel including voice
recognition and passwords, or select a preferred dummy head configuration. 
AIN may have intelligent peripherals to enhance features of the spatial sound 
conference system by announcing new conference room station participants and 
demonstrating their virtual location as they join the spatial sound conference. 

Bellcore protocol 1129, or another protocol, may be used to establish a
communication link between the intelligent peripheral and other machines in
the AIN. An intelligent peripheral, such as a speech synthesizer or live
operator, could make announcements emanating from a selected virtual
location, such as directly above the remote participant. Also, a text intelligent
peripheral could be used to display the name of each new participant on a
computer monitor or an ISDN telephone display. Additionally, information on 
the D-channel could be used to create a computer display showing the 
conference table 120 and the names and faces of the conference room station 
participants. 

AIN could also provide a private link to an intelligent peripheral so that
a remote participant station 199 could request information, such as a list of
present conference participants or the time that the conference started, using
a telephone keypad or computer. Such information requests could result in an
announcement from the selected virtual location heard only by the requestor.
AIN features could be used in conjunction with not only a telephone keypad or
a computer, but also a facsimile machine, or other electronic equipment. AIN 
features, such as those described, may be available in each embodiment of the 
spatial sound conference system. 

Figure 1A shows a schematic of a many-to-many spatial sound
conference using two dummy heads in two conference rooms. Like Figure 1,
a conference room station 100 has a dummy head 101 with spatially disposed
microphones 103, 105 connected to a terminal adapter 128 which is in turn
connected to ISDN facilities 150 with ISDN switches 140, 160. Conference
room station 170 is configured similarly with another dummy head 102 having
spatially disposed microphones 104, 106 connected to another terminal adapter
129, which in turn is connected to ISDN facilities 150. Preferably, participants
in one conference room 100 are positioned in a line (or semicircle) around
conference table 120 to one side of the first dummy head 101, and participants
in the other conference room 170 are positioned in a line (or semicircle) around
conference table 121 to the other side of the second dummy head 102. Thus,
each participant will have a unique virtual location during the spatial sound
conference.

Instead of having a single loudspeaker broadcasting audio signals from
the remote location, each participant in conference room stations 100, 170 has
left and right spatial loudspeakers 113, 115. Preferably loudspeakers 113, 115
are located in a chair and positioned close to the participant's ears to enable
participants in the same conference room to hear each other directly. Spatial
loudspeakers enable the spatial sound signals picked up by the dummy head
in the remote conference room station to be properly replayed to impart spatial
location cues. As in the one-to-many spatial sound conference embodiment
of Figure 1, standard compression algorithms and compression and
decompression units 224, 225, 226, 227 may be used to conform the audio
signals to the available bandwidth and AIN 155 may be used to provide
enhanced features to the spatial sound conference. Echo cancellation could
also be useful in this embodiment. Thus, a many-to-many spatial sound
conference may be implemented using two dummy heads in two conference
room stations.

Figure 2 shows a schematic of a spatial sound conference bridge used in 
a spatial sound conference system. As an alternative to use of a dummy head 
to capture spatialized sound components for right and left audio signals, a 
spatial sound conference bridge may be used to convolve head-related transfer
functions with a monaural signal to create spatial signals. In a teleconference
situation with single participants at multiple sites, a spatial sound conference
bridge 200 containing a head-related transfer function unit 205 can be used to
create a spatial sound conference system. In a preferred embodiment, the
spatial sound conference bridge 200 receives a digital monaural signal via
either B-channel from each conference participant station at ports 201, 202,
203, 204 connected to ISDN lines having 2 B-channels and a D-channel. The
monaural signal may be either compressed or uncompressed depending upon
the available bandwidth. If the incoming monaural signal is compressed,
individual decompression unit 225 could be used to decompress the incoming
signal. If one or more participants do not have a digital line to the
conference bridge, an A/D conversion unit 230 in spatial sound conference 
bridge 200 could be used to digitize the incoming signal in preparation for 
convolution by the head-related transfer function unit 205. 

A spatial sound conference bridge can accommodate as many
participants as are necessary, simply by providing more ports. Also, a port 260
may be provided for a telephone operator. Based on which ports of the spatial
sound conference bridge are active during a particular conference, the spatial
sound conference bridge assigns a unique virtual location for each participant.
The virtual locations of the conference participants could simulate the
participants seated around a circular table. Other configurations could
simulate the participants in a line, in a semicircle, or around a rectangular 
table. 
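
The assignment of virtual locations from the set of active ports can be sketched for the circular table case. The even angular spacing, the ordering by port number, and the function name `assign_circular_seats` are assumptions made for illustration; the description requires only that each participant receive a unique location.

```python
def assign_circular_seats(active_ports):
    """Map each active port to a seat angle in degrees around a circle.

    Seats are evenly spaced and assigned in ascending port order, so every
    active port receives a distinct angle.
    """
    ports = sorted(active_ports)
    if not ports:
        return {}
    step = 360.0 / len(ports)
    return {port: index * step for index, port in enumerate(ports)}


seats = assign_circular_seats([201, 202, 203, 204])
```

A line, semicircle, or rectangular table would replace only the mapping from seat index to position; the rule that active ports determine the seat count stays the same.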

In a preferred embodiment, depending on the virtual location of a 
participant, the spatial sound conference bridge selects a head-related transfer
function relating to the relative virtual position of each participant. The
head-related transfer function unit 205 processes the monaural signal from a
participant and creates two new sound signals, one for each ear of a listener.
The head-related transfer function unit 205 can be a signal processor, such as
the Convolvotron available from Crystal River Engineering in Palo Alto,
California. The two new sound signals combined create a spatialized sound
signal. For example, the head-related transfer function imparting a
spatialization of "two locations to the right" may be applied to the signal from
port 201. The head-related transfer function imparting a spatialization of "one
location to the right" may be applied to the signal from port 202.
Correspondingly, the head-related transfer function of "one location to the left"
may be applied to the signal from port 203, and the head-related transfer
function of "two locations to the left" may be applied to the signal from port
204.

Once the head-related transfer function unit 205 has imparted the 
appropriate spatializations to the monaural signals from each participant, the 
spatial sound conference bridge compiles a composite signal for a particular 
participant station by combining the spatialized sound signals corresponding 
to all of the other participant stations. All composite signals do not need to be 
spatially consistent with each other as long as each composite signal spatially 
places the audio signals for each of the other participants. Thus, the composite 
signal sent from port 201 has spatialized sound signals based on the monaural 
signals from ports 202, 203, and 204. Similarly, the composite signal sent from 
port 202 has spatialized sound signals based on the monaural signals from
ports 201, 203, and 204. Each composite signal is then sent to the proper 
participant station from ports 201, 202, 203, 204. 
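
The compilation of a composite signal can be sketched as a sum over every spatialized signal except the listener's own. The sketch assumes each port's spatialized signal is already available as a (left, right) pair of sample lists at a common sample rate. The names `mix` and `composite_for` and the sample values are hypothetical.

```python
def mix(signals):
    """Sample-wise sum of signals, treating shorter ones as zero-padded."""
    length = max((len(s) for s in signals), default=0)
    return [sum(s[n] for s in signals if n < len(s)) for n in range(length)]


def composite_for(listener_port, spatialized):
    """Combine the (left, right) pairs of every port except the listener's."""
    others = [pair for port, pair in spatialized.items()
              if port != listener_port]
    left = mix([pair[0] for pair in others])
    right = mix([pair[1] for pair in others])
    return left, right


# Toy spatialized signals keyed by port: (left samples, right samples).
spatialized = {
    201: ([1.0, 0.0], [0.5, 0.0]),
    202: ([0.0, 1.0], [0.0, 0.5]),
    203: ([0.25, 0.25], [1.0, 1.0]),
}
left, right = composite_for(201, spatialized)
```

Because each port's signal is spatialized once and then reused in every other listener's composite, the per-listener work in this scheme is limited to addition.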

The outgoing composite signals may be compressed by compression unit
224 and transmitted to the participants via both B-channels. When received 
at each participant station, the composite signals are decompressed and played 
to a participant using spatially positioned loudspeakers. If a participant does 
not have a digital connection to the conference bridge, the conference bridge 
may also convert the outgoing composite signals to that participant using D/A 
conversion unit 231 before transmitting the composite signals. 

With this method, the virtual locations of the conference participants 
may be different from the perspective of each participant. Figures 2A, 2B, 2C, 
and 2D show an example of virtual positions of conference participants A, B, 
C, and D around a round conference table from the perspective of each 
individual participant. In Figure 2A, the perspective of participant A is at the 
head of the virtual conference table. Participant B's virtual position is one seat
to the right, participant C's virtual position is one seat to the left, and
participant D's virtual position is two seats to the left. In Figure 2B, the
perspective of participant B is at the head of the virtual conference table.
Participant A's virtual position is two seats to the right, and participants C
and D have the same virtual position as in Figure 2A. In Figures 2C and 2D, 
it can be seen that this method of virtually seating the participants results in 
minimal signal processing at the spatial sound conference bridge. 

Figures 2E, 2F, 2G, and 2H show that the spatial sound conference
bridge may also be used to establish consistent positions from the perspective
of each conference participant at the cost of higher signal processing
requirements. Head-related transfer functions may place participant B one
seat to the right of participant A, participant D one seat to the left of
participant A, and participant C across from participant A, from the
perspective of all participants.
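
The difference in processing load between the two seating methods can be made concrete with a rough count. The sketch assumes one spatialization (one monaural signal convolved with one head-related transfer function pair) per distinct combination of source and apparent position. That accounting, and the names `SEAT_ORDER`, `relative_seat`, and `spatializations_needed`, are assumptions made here and are not stated in the description.

```python
# Consistent seating of Figures 2E-2H: each participant sits one seat to
# the right of the previous one, so C is across from A and D is on A's left.
SEAT_ORDER = ["A", "B", "C", "D"]


def relative_seat(listener, source, order=SEAT_ORDER):
    """Number of seats to the listener's right at which the source appears."""
    n = len(order)
    return (order.index(source) - order.index(listener)) % n


def spatializations_needed(participants, consistent):
    """Rough count of mono-to-stereo spatializations per block of audio.

    Per-listener seating spatializes each source once and reuses the result;
    consistent seating needs a separate result for each listener-source pair.
    """
    if consistent:
        return participants * (participants - 1)
    return participants


b_seen_by_a = relative_seat("A", "B")   # one seat to the right
a_seen_by_b = relative_seat("B", "A")   # three to the right, i.e. one to the left
```

For four participants this count gives 4 spatializations for the first method and 12 for the consistent method, which is in line with the higher signal processing requirements noted above.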

The spatial sound conference bridge may also have a variety of 
additional features such as adaptive or nonadaptive echo cancellation to reduce 
the effects of compression delays and other delays, reverberation settings to 
simulate various virtual room acoustics, or audio technique algorithms such as
speaker crossover cancellation to optimize playback on spatially disposed
loudspeakers as opposed to a headset. 

Figure 3 shows a schematic of a spatial sound conference system 
implemented using a spatial sound conference bridge across ISDN facilities. 
A teleconference using this system links at least two participant stations. Each
participant station 310 has right and left spatially disposed loudspeakers 303,
305 and a microphone 307. The right and left loudspeakers may be a stereo
headset or loudspeakers for positioning close to the ears of a conference
participant. A monaural audio signal from each participant station is picked
up by microphone 307 and transmitted to a computer processor 320.
Preferably, this processor is unobtrusively integrated into the participant
station. This processor includes terminal adapter 328 which converts the
monaural analog signal to a digital signal and compression unit 324 for
compressing the digital microphone signal. Conventional compression
algorithms, such as ISO MPEG Layer II or III, may be used. Alternatively,
compression may be omitted if enough bandwidth is available to transmit the
uncompressed signal to the digital network.

Each participant station is connected to an ISDN switch 340, 360 that 
is part of ISDN facilities 350. A spatial sound conference bridge 370 is 
included in this configuration to impart a head-related transfer function to the 
monaural signal from each participant station. The spatial sound conference
bridge 370 can be placed virtually anywhere in this configuration, such as
connected to an ISDN switch 340, 360, connected at another ISDN location 
350, or connected to a participant station 310. 

An AIN 355 may overlay ISDN and may use information transmitted on
the D-channel to allow participants to schedule a conference time, recreate a
particular conference room setting using acoustic and reverberation
information, select a preferred virtual conference table size and shape, reserve
a particular position at a virtual conference table, select a spatial sound
conference bridge based on availability or cost, or handle the connection and
disconnection of conference participants. Also, because certain head-related
transfer functions may produce better spatial separation for different
conference participants, AIN may be used to construct or select a preferred
head-related transfer function for an individual participant.

As described in the previous embodiments, AIN may use caller
identification to promote secure access to a spatial sound conference. AIN
intelligent peripherals can announce new participants and demonstrate each
new participant's virtual location. Also, a text intelligent peripheral could be
used to display the name of each new participant on a computer monitor or
ISDN telephone display or create a computer display showing the virtual
conference table and the names and faces of the conference participants. AIN
could provide private links to an intelligent peripheral so that a participant
could request information from the intelligent peripheral using a telephone
keypad or computer. Such information requests could result in an
announcement from the selected virtual location heard only by the requestor.

Digital monaural signals from each participant station 310 are
transferred through ISDN facilities to a spatial sound conference bridge 370,
wherever it is located in the system. If the digital monaural signals are
compressed, the spatial sound conference bridge decompresses the monaural
signals using a decompression A unit 334. Then, depending on the port of an
incoming signal, the spatial sound conference bridge imparts a head-related
transfer function to the signal to create a pair of spatial sound signals using
head-related transfer function unit 335. See Figure 2 and accompanying
description for a detailed explanation of the operation of the spatial sound
conference bridge. The spatial sound conference bridge then compiles
composite signals and compresses them using a compression B unit.
Preferably, both compression A and compression B would use the ISO MPEG
Layer II or III compression algorithm; however, compression A and
compression B could be two different compression algorithms. Compression
B may compress two spatialized audio channels or it may derive the difference
between the two channels, thus allowing transmission of a single channel and
a difference signal with or without further compression. Once the signals are
compressed, the spatial sound conference bridge transmits the composite
signals through the ports and directs these composite signals to the proper
participant station.
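
The bridge operation described above, in which each incoming port is given a
spatial position and the spatialized signals are compiled into a composite
for each station, can be sketched as follows. This is a minimal illustration
under stated assumptions: the `spatialize` function is a crude stand-in that
applies only a level difference and a small interaural delay rather than a
measured head-related transfer function, the port-to-azimuth table is
hypothetical, and leaving a listener's own signal out of that listener's
composite is assumed here rather than taken from this passage.

```python
import math

MAX_DELAY = 3  # assumed maximum interaural delay, in samples


def spatialize(mono, azimuth_deg):
    """Crude stand-in for an HRTF: returns (left, right) sample lists.

    azimuth_deg runs from -90 (full left) to +90 (full right).
    """
    theta = math.radians((azimuth_deg + 90.0) / 2.0)  # 0..90 degrees
    gain_left, gain_right = math.cos(theta), math.sin(theta)
    delay = round(MAX_DELAY * abs(azimuth_deg) / 90.0)
    near = list(mono) + [0.0] * delay  # ear facing the source
    far = [0.0] * delay + list(mono)   # ear shadowed by the head, delayed
    if azimuth_deg >= 0:
        left, right = far, near
    else:
        left, right = near, far
    return ([gain_left * s for s in left],
            [gain_right * s for s in right])


def compile_composite(signals_by_port, azimuth_by_port, listener_port):
    """Sum the spatialized signals of every port except the listener's own."""
    length = max(len(s) for s in signals_by_port.values()) + MAX_DELAY
    left = [0.0] * length
    right = [0.0] * length
    for port, mono in signals_by_port.items():
        if port == listener_port:
            continue  # assumption: a listener does not hear its own signal
        spatial_left, spatial_right = spatialize(mono, azimuth_by_port[port])
        for i, value in enumerate(spatial_left):
            left[i] += value
        for i, value in enumerate(spatial_right):
            right[i] += value
    return left, right


# Hypothetical three-port conference, compiled for the station on port 1.
signals = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
azimuths = {1: -90.0, 2: 0.0, 3: 90.0}
left, right = compile_composite(signals, azimuths, listener_port=1)
```

In a complete bridge, decompression would precede `compile_composite` and the
compression B unit would follow it.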

At participant station 310, the composite signal is received and
decompressed, using decompression B unit 326, into its constituent right and
left spatial sound signals. These signals are converted to analog using terminal
adapter 328 and sent to the left and right spatially disposed loudspeakers 303,
305 in the participant station 310. The compression, decompression,
spatialization, and compilation may be carried out at various locations across
the network or conference, depending on desired allocation and location of
processing resources and transmission bandwidth.
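
The option of transmitting a single channel and a difference signal, and of
recovering the constituent right and left signals at the participant station,
can be sketched as follows. The function names are hypothetical, integer PCM
samples are assumed so that the round trip is exact, and the choice of the
left channel as the transmitted channel is arbitrary.

```python
def encode_difference(left, right):
    """Bridge side: keep the left channel and the per-sample difference."""
    difference = [l - r for l, r in zip(left, right)]
    return list(left), difference


def decode_difference(channel, difference):
    """Station side: rebuild the right channel from left and difference."""
    right = [c - d for c, d in zip(channel, difference)]
    return list(channel), right


left = [100, -20, 3]
right = [90, -25, 3]
channel, difference = encode_difference(left, right)
restored_left, restored_right = decode_difference(channel, difference)
```

Because the two spatial sound signals are usually similar, the difference
signal tends to have small values, which may make it cheaper to transmit or
to compress further.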

Thus, the spatial sound conference system uses head-related transfer
functions to impart spatial qualities to a teleconference implemented across a
network. Sound spatialization may be imparted using a dummy head at a
transmitting station, a spatial sound conference bridge, or an HRTF unit at a
receiving station. This invention may, of course, be carried out in specific ways
other than those set forth here without departing from the spirit and essential 
characteristics of the invention. Therefore, the presented embodiments should 
be considered in all respects as illustrative and not restrictive, and all 
modifications falling within the meaning and equivalency range of the 
appended claims are intended to be embraced therein. 

CLAIMS

We claim: 



1. A spatial sound conference system comprising:
    a conference station comprising:
    right and left spatially disposed microphones connected to a
    communication channel for receiving right and left audio signals, wherein
    the differences between the right and left audio signals represent a
    head-related transfer function; and
    a remote station comprising:
    right and left spatially disposed loudspeakers connected to the
    communication channel.

2. A spatial sound conference system according to claim 1, further
comprising:
    a compression unit connected to the right and left spatially disposed
    microphones for compressing the right and left audio signals; and
    a decompression unit connected to the right and left spatially disposed
    loudspeakers for decompressing the compressed right and left audio signals.

3. A spatial sound conference system according to claim 1, further
comprising:
    a microphone positioned in the remote station and connected to the
    communication channel for receiving an audio signal; and
    a loudspeaker positioned in the conference station and connected
    through the communication channel to the microphone.

4. A spatial sound conference system according to claim 3, further
comprising:
    a compression unit connected to the microphone positioned in the
    remote station for compressing the audio signal; and
    a decompression unit connected to the loudspeaker positioned in the
    conference station for decompressing the compressed audio signal.

5. A spatial sound conference system according to claim 1, wherein the
right and left spatially disposed microphones are positioned on a dummy head.

6. A spatial sound conference system according to claim 5, further
comprising:
    a microphone positioned in the remote station and connected to the
    communication channel for receiving an audio signal; and
    a loudspeaker positioned proximal to the dummy head and connected
    through the communication channel to the microphone.

7. A spatial sound conference system according to claim 5, further
comprising:
    a microphone positioned in the remote station and connected to the
    communication channel for receiving an audio signal; and
    right and left spatially disposed loudspeakers positioned in the
    conference station and connected through the communication channel to the
    microphone.

8. A spatial sound conference system according to claim 6, further
comprising:
    a head-tracking sensor in the remote station connected to the
    communications channel; and
    a position simulator attached to the dummy head and connected through
    the communication channel to the sensor.

9. A spatial sound conference system according to claim 1, further
comprising:
    a video camera positioned in the conference station and connected to the
    communication channel for receiving a video image; and
    a display positioned in the remote station and connected through the
    communication channel to the video camera.

10. A spatial sound conference system according to claim 9, wherein the
video camera is positioned near the location of eyes on a dummy head.

11. A spatial sound conference system according to claim 9, wherein the
display is a head-mounted display.

12. A spatial sound conference system according to claim 1, wherein the
right and left spatially disposed loudspeakers are a headset.

13. A method for conducting a spatial sound conference comprising the steps
of:
    converting audio information into right and left audio signals at a
    conference station, wherein the conversion imparts a differential
    characteristic to the right and left audio signals, and the differential
    characteristic is represented by a head-related transfer function, and the
    right and left audio signals comprise spatialized audio;
    transmitting audio information representative of said spatialized audio
    from the conference station across a communication channel to a remote
    station; and
    playing the spatialized audio in the remote station.

14. A method for conducting a spatial sound conference according to claim
13, further comprising the steps of:
    compressing the right and left audio signals after the step of converting;
    and
    decompressing the compressed right and left audio signals after the step
    of transmitting.

15. A spatial sound conference system comprising:
    a transmitting station comprising:
    a microphone connected to a communications system for receiving
    an audio signal;
    a head-related transfer function unit connected to the communications
    system for imparting a head-related transfer function to the audio signal
    to produce a spatialized audio signal; and
    a receiving station comprising:
    right and left spatially disposed loudspeakers connected to the
    communication system for receiving the spatialized audio signal.

16. A spatial sound conference system according to claim 15, further
comprising:
    a compression unit connected to the microphone for compressing the
    audio signal; and
    a decompression unit connected to the head-related transfer function
    unit for decompressing the compressed audio signal.

17. A spatial sound conference system according to claim 15, further
comprising:
    a compression unit connected to the head-related transfer function unit
    for compressing the spatialized audio signal; and
    a decompression unit connected to the right and left spatially disposed
    loudspeakers for decompressing the compressed spatialized audio signal.

18. A spatial sound conference system according to claim 15, wherein the
head-related transfer function unit is contained in a spatial sound conference
bridge.

19. A method for conducting a spatial sound conference comprising the steps
of:
    receiving an audio signal at a transmitting station;
    transmitting the audio signal from the transmitting station to a spatial
    sound conference bridge;
    imparting a head-related transfer function to the audio signal to create
    a spatialized audio signal;
    sending the spatialized audio signal from the spatial sound conference
    bridge to a receiving station; and
    playing the spatialized audio signal on spatially disposed loudspeakers
    at the receiving station.

20. A method for conducting a spatial sound conference according to claim
19, further comprising the steps of:
    compressing the audio signal after the step of receiving; and
    decompressing the compressed audio signal after the step of
    transmitting.

21. A method for conducting a spatial sound conference according to claim
19, further comprising the steps of:
    compressing the spatialized audio signal after the step of imparting; and
    decompressing the compressed spatialized audio signal after the step of
    sending.

22. A method for conducting a spatial sound conference comprising the steps
of:
    receiving an audio signal at a transmitting station;
    transmitting the audio signal from the transmitting station to a
    receiving station;
    imparting a head-related transfer function to the audio signal to create
    a spatialized audio signal; and
    playing the spatialized audio signal on spatially disposed loudspeakers
    in the receiving station.

23. A method for conducting a spatial sound conference according to claim
22, further comprising the steps of:
    compressing the audio signal after the step of receiving; and
    decompressing the compressed audio signal after the step of
    transmitting.

24. A spatial sound conference bridge comprising:
    at least two input ports for receiving at least two audio signals;
    a head-related transfer function unit connected to the at least two input
    ports for imparting a head-related transfer function to at least one
    received audio signal to produce at least one spatialized audio signal; and
    at least two output ports connected to the head-related transfer function
    unit for transmitting the spatialized audio signal.

25. A spatial sound conference bridge according to claim 24, further
comprising:
    a decompression unit connected to at least one input port for
    decompressing at least one audio signal.

26. A spatial sound conference bridge according to claim 24, further
comprising:
    a compression unit connected to at least one output port for compressing
    at least one spatialized audio signal.

27. A method for conducting a spatial sound conference comprising the steps
of:
    receiving at least two monaural audio signals;
    generating at least two sets of spatialized audio signals from the at least
    two monaural audio signals using at least two head-related transfer
    functions;
    compiling at least one composite signal from the at least two sets of
    spatialized audio signals;
    transmitting at least one composite signal to a location; and
    playing at least one composite signal at the location.

ABSTRACT OF THE DISCLOSURE



The spatial sound conference system enables participants in a 
teleconference to distinguish between speakers even during periods of 
interruption and overtalk, identify speakers based on spatial location cues, 
understand low volume speech, and block out background noise using spatial 
sound information. Spatial sound information may be captured using 
microphones positioned at the ear locations of a dummy head at a conference 
table, or spatial sound information may be added to a participant's monaural 
audio signal using head-related transfer functions. Head-related transfer 
functions simulate the frequency response of audio signals across the head 
from one ear to the other ear to create a spatial location for a sound. Spatial 
sound is transmitted across a communication channel, such as ISDN, and 
reproduced using spatially disposed loudspeakers positioned at the ears of a 
participant. By inserting a spatial sound component in a teleconference, a 
speaker other than the loudest speaker may be heard during periods of 
interruption and overtalk. Additionally, speakers may be more readily 
identified when they have a spatial sound position, and the perception of 
background noise is reduced.